1. INTRODUCTION


Significant research effort has been devoted to the development of an improved 
crop area estimation procedure. This procedure would be a replacement for 
Procedure 1, which was used extensively for crop area estimation in the Large 
Area Crop Inventory Experiment (LACIE) at the National Aeronautics and Space 
Administration, Lyndon B. Johnson Space Center (NASA/JSC). 

In view of the deficiencies of Procedure 1 (ref.), the goal of this research
has been to develop a procedure which is efficient in the sense of having a 
small mean squared error relative to simple random sampling and which, at the 
same time, uses a minimum number of labeled pixels. These two goals are in a 
sense complementary. An efficient procedure is one which obtains a specified 
acceptable variance with a minimum number of labeled or training pixels. 

2. CLUSTER-BASED PROPORTION ESTIMATION

As a result of evaluations of Procedure 1 by Jess Carnes (ref.), it became 
clear at the beginning of the development effort that the classification which 
followed the clustering in Procedure 1 did not significantly improve the 
stratification of the scene. Thus, from the outset, cluster-based procedures 
were developed. That is, the candidate procedures were of the stratified 
sampling variety, where the strata would be obtained by using an unsupervised 
clustering procedure. This approach had the advantage of eliminating the 
type 1 dots used for initiating and labeling the clusters in Procedure 1. In 
addition, stratifying with clusters was expected, on theoretical grounds, to 
be more efficient than stratifying with the two strata produced by the 
classifier. 

In order to begin development of the procedure, it was necessary to choose an 
unsupervised clustering algorithm. Three algorithms, the Iterative Self- 
Organizing Clustering System (ISOCLS), the Texas A&M University-developed 
program (AMOEBA), and the CLASSY program, were tested by applying them to 21 
LACIE Phase III blind sites and evaluating the average purity of the resultant 
clusters and the theoretical reduction in variance for the stratification.
These evaluations were made using the ground-truth label for every pixel in 
the image. A complete statement of the results of this is found in appendix 
A. The basic finding was that the average performance for the three 
algorithms tested was remarkably similar. The only significant difference was 
in the number of clusters generated. CLASSY generated an average of about 9, 
AMOEBA had an average of about 17, and ISOCLS had about 37 clusters. It was 
concluded that the similarity of performance probably indicated that a limit 
had been reached in the separability of the data. The fact that this parallel 
performance was obtained with very few clusters was seen as an advantage for 
the CLASSY and AMOEBA algorithms. 

The next stage in the development was to test each of the candidate clustering 
algorithms in combination with various schemes for forming proportion 
estimates. Six different proportion estimation techniques were chosen for 
testing. Three of these were techniques which resulted in the labeling of 
entire clusters. They may be described as (1) proportional allocation 
followed by majority-rule labeling, (2) a sequential allocation technique for 
labeling with a fixed degree of confidence, and (3) a Bayesian sequential 
technique for labeling with a fixed degree of confidence. Three techniques 
for stratified proportion estimation using clusters as the strata were also 
tested. They may be described as (4) proportional allocation followed by 
stratified proportion estimation, (5) a sequential allocation technique for 
minimizing the estimated mean squared error of the proportion estimate at each 
step, and (6) a Bayesian sequential allocation technique for minimizing the 
estimated mean squared error of the proportion estimate of each step. Each of 
these techniques is described in detail in appendix A. 

The evaluation involved testing each of these six techniques in combination 
with each of the three clustering algorithms. Each combination of clustering 
algorithm and estimation technique was used with 100 different pseudorandom
allocations of ground-truth-labeled pixels for each segment. Initially, each 
technique was evaluated using five segments. Promising techniques were subse- 
quently evaluated using all of the 21 Phase III blind sites used in evaluating 
the clustering algorithms. The results of this study, as presented in
appendix A, were that only two of the techniques appeared to perform 
consistently better than simple random sampling. These were proportional 
allocation followed by stratified proportion estimation and sequential 
Bayesian allocation for minimizing the mean squared error of the stratified 
proportion estimate at each step. The proportional allocation technique had a 
reduction in mean squared error over simple random sampling of about 0.65 for
each of the three clustering algorithms. The Bayesian sequential allocation
technique had a reduction in mean squared error of about 0.51 for CLASSY and
ISOCLS and about 0.73 for AMOEBA. Because the CLASSY program generated many
fewer clusters than ISOCLS, it was possible to estimate the purity of each 
cluster using a much smaller number of total labeled pixels. Hence, the 
Bayesian sequential stratified proportion estimate, using CLASSY clusters as 
the strata, emerged as the best technique of those tested with respect to the 
goals of this study. 

3. ANALYST LABELING WITH BAYESIAN SEQUENTIAL TECHNIQUE 

Because all of the preliminary testing had been done with ground-truth labeled
pixels, it was desirable to test this new Bayesian sequential technique with
CLASSY clusters as the strata using analyst-interpreter (AI) labels. This was
the focus of the second study, which is reported in appendix B. In this test,
each of 10 LACIE Phase III blind sites was evaluated using the Bayesian
sequential procedure. A total of 45 AI-labeled dots were allocated to each
segment. The result was that the Bayesian sequential procedure performed 
significantly better than either Procedure 1 or simple random sampling. The 
reduction in mean squared error adjusted for sample size was approximately
0.51 for the Bayesian sequential technique compared to simple random sampling 
and approximately 0.29 for the Bayesian sequential technique compared to 
Procedure 1. In addition, the Bayesian sequential procedure obtained a lower 
average bias than either simple random sampling or Procedure 1. This led to 
the investigation of the AI error rate on the sequentially labeled pixels
versus the pixels allocated as random samples. In each of the segments 
tested, the AI error rate for small-grain pixels was lower for the dots 
allocated using the Bayesian sequential technique. This phenomenon appears to
be due to the influence of the prior distribution on cluster purities used in 
the Bayesian scheme. In effect, the prior distribution considers pure small- 
grain clusters to be fairly rare. Hence, if they occur, they are sampled more 
heavily to verify their reliability. Since pure small-grain clusters are more 
accurately labeled, this reduces the overall AI error rate. 

4. CONCLUSION

Based on the results of tests using both ground-truth and AI labels, it is the 
conclusion of these studies that stratified proportion estimation using CLASSY 
clusters as the strata and Bayesian sequential allocation as the allocation 
and estimation technique for minimizing the mean squared error of the propor- 
tion estimate offers significant advantages over Procedure 1. It is the 
recommendation of these studies that this new technique be considered as a 
replacement for Procedure 1 and further tested in a semioperational
environment. 


5. REFERENCE 
1. BACKGROUND AND INTRODUCTION

In performing machine classification of remotely sensed data, clustering has
typically been used to analyze and determine the inherent data signatures. In
the proportion estimation system developed during the Large Area Crop Inventory
Experiment (LACIE) and called Procedure 1, the multispectral land satellite
(Landsat) data was first clustered to obtain the spectral signatures. These
signatures were then labeled and used to train a maximum likelihood classifier
which classified each picture element (pixel) in the image into one of the
labeled classes. The final step was to evaluate the performance of this clas-
sifier on an independent labeled data set and to use the estimates of the
omission and commission errors resulting from this evaluation to correct the
bias in the classified data. Procedure 1, thus, required two sets of labeled
data. A set of approximately 40 labeled pixels, called type 1 dots, was used
to initiate the clustering and to label the resulting clusters. Another set
of approximately 60 labeled pixels, called type 2 dots, was used to evaluate
the classifier and correct any bias in the overall proportion estimates for
the labeled classes.

Within the past year, different investigations have resulted in several impor-
tant conclusions regarding the Procedure 1 system. One study (ref. 1) con-
cluded that the labeled clusters agreed very closely with corresponding
classifier results. This seems to imply that the classification is unnecessary.
In a second series of studies (refs. 2 and 3), it was found that the overall
variance of the proportion estimates resulting from Procedure 1 was smaller
only by a factor of about 0.7 (on the average) than that of the proportion
estimates resulting from a simple random sample of 60 labeled pixels. The
conclusion was that the machine processing, which comprised Procedure 1, was
relatively inefficient.

The current study was designed as a response to the observed deficiencies in
Procedure 1. It appeared that the classification step was unnecessary and
that a more efficient procedure would be to simply cluster the data using a
completely unsupervised clustering algorithm and then use any labeled pixels
to either label the resulting clusters directly or to perform a stratified
estimate using the clusters as the strata. Such an approach would have the 
advantage of eliminating the need for the type 1 dots as well as the machine 
classification step. 

Since clustering was to be the primary machine processing step in the new
procedure, it was important to choose the most efficient clustering algorithm
available. Three algorithms were ultimately chosen for testing. These algo- 
rithms were: 

a. CLASSY (refs. 4, 5, and 6) - an adaptive maximum likelihood algorithm 
developed at the National Aeronautics and Space Administration (NASA), 
Lyndon B. Johnson Space Center (JSC) 

b. AMOEBA (ref. 7) - an algorithm developed at Texas A&M University,
employing both spectral and spatial information

c. The Iterative Self-Organizing Clustering System (ISOCLS), (ref. 8) - a
variant of the ISODATA algorithm of Ball and Hall (ref. 9), and the algo-
rithm used in Procedure 1

These algorithms were applied to each of 25 LACIE segments collected during
the 1976-77 crop year. The details of the clustering algorithms and the meas-
ures used in evaluating the clustering results are discussed in section 2 of
this report.

An equally important part of defining a new proportion estimation procedure
was the selection of a scheme for obtaining a stratified estimate or a method
of labeling each cluster. In this regard, three stratified estimation schemes
and three labeling schemes were considered. The details of these schemes are
described in section 3. A description of the data set and the experimental
design is included in section 4. Section 5 is a summary of the primary
results, and section 6 consists of the conclusions drawn from the observed
results with appropriate recommendations.
2. CLUSTERING ALGORITHMS AND EVALUATION CRITERIA

The clustering evaluation portion of the study consisted of running each of 
three different clustering algorithms on each of the 25 LACIE segments selected. 
The clustering algorithms tested were CLASSY, AMOEBA, and ISOCLS. 

CLASSY was run using three complete passes through the data where the data set 
consisted of every other pixel in the image. Clusters smaller than 2 percent 
of the scene were eliminated. 

ISOCLS was run with the standard iterative parameter set recommended by Wylie 
and Bean (ref. 10) and known as the MPAD cluster parameter set. The values 
of these parameters are given in table 2-1. The algorithm was started with 
40 randomly selected and unlabeled pixels from each image. 

AMOEBA was run with parameters specified by its developers at Texas A&M Uni-
versity. The minimum number of clusters was set at five.

Both CLASSY and AMOEBA were run on data which had been transformed to Kauth 
brightness and greenness coordinates on each pass (ref. 11). This reduced the 
dimensionality of the data by a factor of 2. ISOCLS was run on the full dimen- 
sional data in accordance with the standard practice during LACIE Phase III. 

Each of the algorithms tested produced cluster maps which were subsequently 
compared with digitized ground-truth maps. The ground-truth maps were pre-
pared from ground-truth images having a resolution six times that of Landsat
imagery. The higher resolution ground truth was converted to Landsat resolu- 
tion by applying majority rule to each six-subpixel area corresponding to one 
Landsat pixel. In the event of ties, the first label to receive the tying 
number of subpixels was chosen as the Landsat pixel label. 
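
The majority-rule conversion and its tie-breaking convention can be sketched as follows. This is only an illustration: it assumes the six subpixels are visited in a fixed scan order (the order is not stated above), and the function name `majority_label` is hypothetical.

```python
from collections import Counter


def majority_label(subpixels):
    """Return the majority label of one Landsat pixel's subpixels.

    Ties go to the label that first reaches the winning count while the
    subpixels are scanned in the given (assumed) order.
    """
    winning_count = max(Counter(subpixels).values())
    running = Counter()
    for label in subpixels:
        running[label] += 1
        if running[label] == winning_count:
            return label
    raise ValueError("subpixels must not be empty")
```

For example, `majority_label(["wheat", "other", "other", "wheat", "wheat", "other"])` returns the label that reaches three subpixels first in scan order.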

TABLE 2-1.- MPAD CLUSTER PARAMETER SET

                     Number of channels
Parameter        8              12             16

CLUSTERS         60.0           60.0           60.0
THRESHOLD        8191           8191           8191
SEP              1              1              1
PERCENT          100            90             90
STDMAX           3.6            3.6            3.6
DLMIN            3.9            4.1            4.5
NMIN             50             50             50
ISTOP            8              8              8
SEQUEN           Split-combine  Split-combine  Split-combine
DOTFIL           (a)            (a)            (a)

(a) Randomly selected starting dots

By comparing the digitized ground truth with a cluster image, the proportion
of each ground-truth class making up each cluster was determined. The pro-
portions for the small-grains classes were then combined to give the proportion
of small grains ($P_i$) in each cluster. These data were used to calculate two
different evaluation criteria for each clustered image. These criteria are
called the variance reduction criterion (R) and the percent of correct classi-
fication (PCC), using majority rule labeling.


The R criterion represents the ratio of the variance of a proportion estimate
based on a stratified random sample allocation (in which strata are the clus-
ters) to the variance of a simple random sample proportion estimate. The
equation for this ratio (when samples that are allocated to clusters are pro-
portional to the size of the cluster) follows:

$$R = \frac{\sum_{i=1}^{c} \frac{N_i}{N_T} P_i (1 - P_i)}{P (1 - P)} \tag{1}$$

where

$c$ = total number of clusters

$N_i$ = total number of pixels in cluster i

$N_T$ = total number of pixels in the segment

$P_i$ = the proportion of small grains in cluster i

$P$ = the overall proportion of small grains in the segment.

The parameters $P_i$ and $P$ were evaluated using the Accuracy Assessment (AA) digi-
tized ground-truth data for each segment.
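
As an illustration of how the R criterion behaves, the following minimal sketch computes the ratio of the size-weighted within-cluster variance to the simple random sampling variance. The function name and the example cluster sizes and purities are hypothetical, not data from this study.

```python
def variance_reduction(cluster_sizes, cluster_purities):
    """R criterion: stratified (proportional allocation) variance divided by
    simple random sampling variance.

    cluster_sizes    -- pixel counts N_i per cluster
    cluster_purities -- crop proportions P_i per cluster
    """
    n_total = sum(cluster_sizes)
    # Overall proportion P is the size-weighted mean of the cluster purities.
    overall = sum(n * p for n, p in zip(cluster_sizes, cluster_purities)) / n_total
    within = sum(
        (n / n_total) * p * (1.0 - p)
        for n, p in zip(cluster_sizes, cluster_purities)
    )
    return within / (overall * (1.0 - overall))


# Hypothetical two-cluster scene: purer clusters give a smaller R.
print(variance_reduction([60, 40], [0.9, 0.1]))
```

Perfectly pure clusters give R = 0, while clusters that all share the segment-wide purity give R = 1, that is, no gain over simple random sampling.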


The PCC criterion measures the proportion of pixels that would be correctly
labeled or classified if each cluster were labeled by majority rule. The equa-
tion for computing the PCC criterion may be written as follows:

$$\mathrm{PCC} = \sum_{P_i > 0.5} \frac{N_i}{N_T} P_i + \sum_{P_i \le 0.5} \frac{N_i}{N_T} (1 - P_i) \tag{2}$$

where $P_i$, $N_i$, and $N_T$ are defined above. The first term represents the summa-
tion over all clusters having $P_i > 0.5$. These clusters would be labeled "small
grains" by majority rule. The second term represents the summation over all
clusters having $P_i \le 0.5$. These clusters would be labeled "other" by majority
rule.
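
A minimal sketch of the PCC computation follows, assuming the two summations described above (clusters with purity above one-half contribute their purity, the rest contribute one minus their purity). The function name is hypothetical.

```python
def percent_correct_classification(cluster_sizes, cluster_purities):
    """PCC as a fraction: pixels correctly labeled when every cluster is
    labeled by majority rule.

    A cluster with purity p contributes p if labeled "small grains"
    (p > 0.5) and 1 - p if labeled "other", which equals max(p, 1 - p).
    """
    n_total = sum(cluster_sizes)
    correct = sum(
        n * max(p, 1.0 - p) for n, p in zip(cluster_sizes, cluster_purities)
    )
    return correct / n_total
```

Multiplying the result by 100 gives a percentage, if that is the preferred unit.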

The R criterion serves as a measure of the efficiency of a clustering algo- 
rithm as used in a stratified sampling proportion estimation scheme. The PCC 
criterion, on the other hand, serves as an overall indicator of cluster purity 
and of the quality of a proportion estimate obtained by labeling clusters. 

The results of evaluating these criteria for each of the three clustering 
algorithms as applied to the 25 LACIE segments are given in section 5. 
3. TECHNIQUES FOR CLUSTER-BASED PROPORTION ESTIMATION


The objective of performing clustering in the context of Procedure 1 replace-
ment is to use the results of the clustering as a basis for obtaining a pro-
portion estimate for a crop of interest. In this study, six different tech-
niques for obtaining proportion estimates by labeling a subset of pixels from
the image were explored. Three of these techniques result in a labeling of
each cluster, whereas the other three produce estimates of the proportion of
the crop of interest in each cluster. We will refer to the first three tech-
niques as cluster-labeling techniques and the last three as stratified propor-
tion estimation techniques.

The various cluster-labeling techniques differ from one another in the manner
in which the subset of pixels to be labeled is selected. In one technique,
pixels are allocated to each cluster, proportionally to the size of that
cluster; that is, if $n_T$ total pixels are to be labeled, then

$$n_i = \frac{N_i}{N_T}\, n_T \tag{3}$$

is the number of pixels to be labeled from each cluster. It should be noted
that if $n_i$ is not an integer, it is rounded up or down. If this produces a
total number of pixels less than $n_T$, the remaining pixels are selected first
from the largest cluster, then the next largest, continuing in this manner.
Clusters too small to receive a single pixel are lumped together, and an
allocation is made to that lumped group. Following the pixel allocation,
majority rule may be applied to label the cluster; that is, if

$$\hat{P}_i = \frac{x_i}{n_i} \tag{4}$$

where $x_i$ = the number of pixels out of the $n_i$ pixels labeled in cluster i
that are the crop of interest, then the labeling rule is as follows:

a. Label cluster i as the crop of interest if $\hat{P}_i > 1/2$.

b. Otherwise, label cluster i as being other than the crop of interest.

The proportion estimate is obtained as

$$\hat{P} = \sum_{i \in L} \frac{N_i}{N_T} \tag{5}$$

where $L$ is the set of clusters labeled as the crop of interest.

The procedure just described will be called cluster labeling by proportional
allocation.
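
The allocation and labeling steps just described can be sketched as follows. This is a simplified illustration under stated assumptions: rounding uses Python's built-in `round`, shortfalls are topped up largest cluster first, and the lumping of very small clusters is omitted. All function names are hypothetical.

```python
def proportional_allocation(cluster_sizes, n_labels):
    """Allocate n_labels pixels to clusters in proportion to cluster size."""
    n_pixels = sum(cluster_sizes)
    alloc = [round(n_labels * size / n_pixels) for size in cluster_sizes]
    # If rounding leaves a shortfall, give the remainder to the largest
    # cluster first, then the next largest, and so on.
    by_size = sorted(range(len(cluster_sizes)), key=lambda i: -cluster_sizes[i])
    k = 0
    while sum(alloc) < n_labels:
        alloc[by_size[k % len(by_size)]] += 1
        k += 1
    return alloc


def labeled_proportion(cluster_sizes, crop_counts, alloc):
    """Majority-rule cluster labeling followed by the size-weighted estimate:
    the fraction of pixels lying in clusters labeled as the crop."""
    n_pixels = sum(cluster_sizes)
    crop_pixels = sum(
        size
        for size, x, n in zip(cluster_sizes, crop_counts, alloc)
        if n > 0 and x / n > 0.5
    )
    return crop_pixels / n_pixels
```

With cluster sizes of 340, 330, and 330 pixels and 10 labels, rounding gives three labels per cluster, and the one remaining label goes to the largest cluster.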


The other two cluster-labeling procedures tested were developed by M. D. Pore
of Lockheed Electronics Company, Inc. (ref. 12). One approach, called cluster
labeling by sequential allocation, labels pixels, selected at random, from a
given cluster until a confidence interval for the estimated proportion of the
crop of interest no longer contains one-half.


The final cluster-labeling approach tested is called cluster labeling by
sequential Bayesian allocation. In this approach a Bayesian estimate for $P_i'$,
the probability that the true proportion of the crop of interest is less than
or equal to one-half, is developed. The formal equation is

$$P_i' = \operatorname{Prob}\left(\theta_i \le \tfrac{1}{2} \mid x_i\right) = \int_0^{1/2} f(\theta_i \mid x_i)\, d\theta_i = \frac{\int_0^{1/2} f(x_i \mid \theta_i)\, g(\theta_i)\, d\theta_i}{\int_0^{1} f(x_i \mid \theta_i)\, g(\theta_i)\, d\theta_i} \tag{6}$$

where $\theta_i$ = the true proportion of the crop of interest in cluster i,
$g(\theta_i)$ = the unknown prior distribution for the $\theta_i$'s, and, as before, $x_i$ = the
number of pixels out of the $n_i$ pixels labeled in cluster i that are the crop
of interest.
The strategy is to select a form for $g(\theta_i)$ and calculate the form of $P_i'$. Then
one may continue sampling at random and labeling the samples selected until
$P_i'$ is smaller or larger than a fixed threshold. If $P_i'$ is smaller than $\alpha$, then
label cluster i as the crop of interest. If $P_i'$ is greater than
$1 - \alpha$, then label the cluster as other than the crop of interest. Thus, in
both cluster labeling by sequential allocation and cluster labeling by Bayesian
sequential allocation, labeling from a given cluster continues until a specified
confidence on the label of that cluster is obtained. The Bayesian scheme uses the
additional information of an estimated prior distribution on the true cluster
purities produced by a given algorithm. The necessary labeling rules and
equations for these two techniques are developed in (ref. 12) and repeated
here.

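For cluster labeling by sequential allocation, the labeling rule is as follows:

a. Continue labeling if

$$\tfrac{1}{2} \in \left( \hat{P}_i - 1.534\, s_i,\ \hat{P}_i + 1.534\, s_i \right)$$

where $s_i$, the estimated standard deviation of $\hat{P}_i = x_i / n_i$, is
proportional to $\sqrt{x_i (n_i - x_i)}$ for a given $n_i$, or until 35 samples
have been allocated.

b. Otherwise, label by majority rule.

This interval provides an approximate confidence of 1 - 1/8 = 0.875 in the
label for each cluster.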

For cluster labeling by sequential Bayesian allocation, the labeling rule is
as follows:

a. Label two pixels from a given cluster. If $x_i$ = 0 or 2, stop and label by
majority rule. Otherwise, go to step b.

b. Label three more pixels. If $x_i$ = 1 or 4, stop and label by majority rule.
Otherwise, go to step c.

c. Label two more pixels. If $x_i$ = 2 or 5, stop and label by majority rule.
Otherwise, go to step d.

d. Label three more pixels. If $x_i$ = 3 or 7, stop and label by majority rule.
Otherwise, go to step e.

e. Label three more pixels and label the cluster by majority rule.

This labeling rule is derived using a uniform prior for $g(\theta)$ and also provides
an approximate probability of correct labeling of 1 - 1/8 = 0.875.
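
The staged stopping rule in steps a through e can be written as a small table-driven routine. The sketch below assumes the labels for one cluster arrive as a sequence of booleans (True for the crop of interest) in sampling order; the names `STAGES` and `bayesian_sequential_label` are hypothetical.

```python
# (pixels to label in this stage, cumulative crop counts that stop labeling)
STAGES = [(2, (0, 2)), (3, (1, 4)), (2, (2, 5)), (3, (3, 7)), (3, None)]


def bayesian_sequential_label(labels):
    """Apply steps a-e to one cluster.

    labels -- iterable of booleans in sampling order (True = crop of interest);
              it must supply at least as many labels as the rule consumes.
    Returns (is_crop, n_labeled), where is_crop is the majority-rule label.
    """
    draws = iter(labels)
    n = x = 0
    for batch, stop_counts in STAGES:
        for _ in range(batch):
            x += 1 if next(draws) else 0
            n += 1
        # The last stage (stop_counts is None) always ends the labeling.
        if stop_counts is None or x in stop_counts:
            break
    return 2 * x > n, n
```

At most 13 pixels are labeled per cluster, and a cluster whose first two labels agree is settled after only two.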

The three techniques for stratified proportion estimation parallel the three
cluster-labeling techniques just discussed. One possibility is to allocate
a total of $n_T$ pixels such that each cluster receives an allocation proportional
to its size. This proportional allocation is accomplished as described earlier
in this section. The proportion estimate is then computed as

$$\hat{P} = \sum_{i=1}^{c} \frac{N_i}{N_T} \frac{x_i}{n_i} \tag{7}$$

The term $x_i / n_i$ represents an estimate of the proportion of cluster i which is the
crop of interest. The remaining two techniques for stratified proportion
estimation differ in the rules used for allocating pixels to clusters and in
the equation used for obtaining the final estimate. As was the case for clus-
ter labeling, both techniques are sequential in nature with one employing a
Bayesian prior distribution. Both techniques were developed by M. D. Pore
(ref. 13).


The concept of sequential sampling as it is used in these two techniques is
to apply information obtained from previously allocated samples in determining
which cluster should receive the new sample. Suppose $n_i$ pixels have been
allocated to cluster i, and $x_i$ of these pixels are of the crop of interest.
Then

$$\hat{\sigma}_n^2 = \sum_{i=1}^{c} \left( \frac{N_i}{N_T} \right)^2 \frac{x_i (n_i - x_i)}{n_i^2 (n_i - 1)} \tag{8}$$

where $\hat{\sigma}_n^2$ is an estimate of the variance of the usual stratified proportion
estimator as given in equation (7). Now the estimated expected value of
$\hat{\sigma}_{n+1}^2$ is (if one more sample from the ith cluster is taken)

$$E\left(\hat{\sigma}_{n+1}^2\right) = \frac{x_i}{n_i}\, \hat{\sigma}_{n+1}^2(x_i + 1) + \left( 1 - \frac{x_i}{n_i} \right) \hat{\sigma}_{n+1}^2(x_i) \tag{9}$$

where $\hat{\sigma}_{n+1}^2(x_i + 1)$ is the variance based on n + 1 total samples if the last
sample selected is from cluster i and is also the crop of interest, and
$\hat{\sigma}_{n+1}^2(x_i)$ is the variance if the last sample selected is from cluster i and is
other than the crop of interest.

The expected change in the estimated segment proportion variance due to an
additional labeled sample from cluster i is then

$$\Delta\sigma_i^2 = \hat{\sigma}_n^2 - E\left(\hat{\sigma}_{n+1}^2\right) \tag{10}$$

Written in terms of the basic variables this equation becomes

$$\Delta\sigma_i^2 = \left( \frac{N_i}{N_T} \right)^2 \frac{(n_i + 3)\, x_i (n_i - x_i)}{(n_i - 1)\, n_i^2\, (n_i + 1)^2} \tag{11}$$


The strategy for the first technique, which we shall call stratified propor-
tion estimation using sequential allocation, is to first allocate at random
a fixed number of pixels to each cluster for the purpose of obtaining an ini-
tial estimate of the proportion of each cluster which is the crop of interest.
Then $\Delta\sigma_i^2$ is computed for each cluster, and the next sample to be labeled is
allocated to the cluster with the largest value of $\Delta\sigma_i^2$. This process con-
tinues until a fixed number of pixels have been labeled. The proportion esti-
mate is then

$$\hat{P} = \sum_{i=1}^{c} \frac{N_i}{N_T} \frac{x_i}{n_i} \tag{12}$$
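
The allocation step can be sketched as below. The sketch assumes that each cluster's contribution to the variance estimate is $(N_i/N_T)^2 x_i (n_i - x_i) / [n_i^2 (n_i - 1)]$ and that every cluster already holds at least two labeled pixels; under that assumption the closed form for the expected drop can be checked against the direct difference of the current and expected variance terms. All function names are hypothetical.

```python
def variance_term(weight, n_i, x_i):
    """One cluster's contribution to the stratified variance estimate
    (assumed form; requires n_i >= 2)."""
    return weight ** 2 * x_i * (n_i - x_i) / (n_i ** 2 * (n_i - 1))


def expected_drop(weight, n_i, x_i):
    """Closed-form expected variance reduction from labeling one more pixel
    in a cluster with weight N_i / N_T, n_i labels, and x_i crop labels."""
    return (
        weight ** 2 * (n_i + 3) * x_i * (n_i - x_i)
        / ((n_i - 1) * n_i ** 2 * (n_i + 1) ** 2)
    )


def next_cluster(cluster_sizes, n, x):
    """Index of the cluster whose next label is expected to help most."""
    total = sum(cluster_sizes)
    drops = [
        expected_drop(size / total, n_i, x_i)
        for size, n_i, x_i in zip(cluster_sizes, n, x)
    ]
    return max(range(len(drops)), key=drops.__getitem__)
```

Note that a cluster whose labeled pixels so far are all of one class has an expected drop of zero under this rule, so it receives no further samples unless every other cluster is in the same state.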




The last technique, which is called stratified proportion estimation using
Bayesian sequential allocation, is similar to the technique just described
except that the additional information of a prior distribution on cluster
purities is used. In this case we use the posterior Bayes estimate

$$\hat{\theta}_i = E(\theta_i \mid x_i) = \frac{\int_0^1 \theta\, f(x_i \mid \theta)\, g(\theta)\, d\theta}{\int_0^1 f(x_i \mid \theta)\, g(\theta)\, d\theta} \tag{13}$$

in place of the minimum variance unbiased estimator $x_i / n_i$.

Although $\hat{\theta}_i$ is not unbiased, it is the minimum mean-square-error estimator.
Following an initial fixed allocation to each cluster, one may then use $\hat{\theta}_i$
in place of $x_i / n_i$ in equations (8) and (9) to calculate $\Delta\sigma_i^2$ for each cluster and
proceed to allocate sequentially as before. The only difficulty is in the
selection of a prior distribution on cluster purities.


The prior distribution on cluster purities was chosen following an examination
of the empirical distribution for each of the three clustering algorithms
on a subset of 10 segments. These histograms representing percentage of clus-
ters versus ground-truth percentage of small grains are given in figures 3-1,
3-2, and 3-3. The similarity of these histograms and their general shape led
to the belief that at least for segments having a moderate to large amount of
small grains, a prior distribution which was quadratic in form would be
appropriate.

It seemed reasonable that the prior distribution, $g(\theta)$, satisfy the follow-
ing criteria:

$$g(\theta) \ge 0 \quad \text{for all } 0 < \theta < 1$$

$$\int_0^1 g(\theta)\, d\theta = 1 \tag{14}$$

and

$$\int_0^1 \theta\, g(\theta)\, d\theta = \hat{P}$$

where

$$\hat{P} = \sum_{i=1}^{c} \frac{N_i}{N_T} \frac{x_i}{n_i}$$

and is computed following the fixed allocation of pixels to clusters.

Figure 3-1.- Empirical purity distribution for CLASSY clusters over 10 segments compared with
quadratic prior.

Figure 3-2.- Empirical purity distribution for AMOEBA clusters over 10 segments compared with
quadratic prior.

Figure 3-3.- Empirical purity distribution for ISOCLS clusters over 10 segments compared with
quadratic prior.



These three conditions allow the specification of the three coefficients in
the equation

$$g(\theta) = a\theta^2 + b\theta + c$$

These coefficients are

$$a = 6, \qquad b = 12(\hat{P} - 1), \qquad c = 5 - 6\hat{P} \qquad \text{for } 0.211 < \hat{P} < 0.789 \tag{15}$$

It should be noted that the b and c coefficients are only appropriate for a
specified range of $\hat{P}$ values. If $\hat{P}$ is not in this range, then $g(\theta)$ will be
negative at some point.
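
The stated range can be checked numerically. The sketch below takes the coefficients a = 6, b = 12(P - 1), c = 5 - 6P, confirms that the resulting quadratic integrates to 1 with mean P, and evaluates its minimum, which lies at the vertex $\theta = 1 - P$. The function names are hypothetical.

```python
def quadratic_prior_coefficients(p_hat):
    """Coefficients (a, b, c) of g(theta) = a*theta**2 + b*theta + c."""
    return 6.0, 12.0 * (p_hat - 1.0), 5.0 - 6.0 * p_hat


def prior_minimum(p_hat):
    """Smallest value of the quadratic prior on [0, 1].

    The vertex -b / (2a) equals 1 - p_hat, which lies inside [0, 1]
    whenever p_hat does, so the minimum is attained there.
    """
    a, b, c = quadratic_prior_coefficients(p_hat)
    theta = -b / (2.0 * a)
    return a * theta ** 2 + b * theta + c


a, b, c = quadratic_prior_coefficients(0.34)
print(a / 3 + b / 2 + c)      # integral of g over [0, 1]
print(a / 4 + b / 3 + c / 2)  # mean of g
print(prior_minimum(0.34), prior_minimum(0.1))
```

The minimum simplifies to $-6P^2 + 6P - 1$, which is nonnegative exactly when P lies between about 0.211 and 0.789.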


The fact that a quadratic prior is only appropriate over a limited range of 
P values also seemed to be validated by empirical evidence. Figures 3-4 and 
3-5 show histograms of cluster purity for eight segments which had low ground- 
truth proportions of small grains. Clearly a quadratic prior is not appro- 
priate. On this basis, it was decided to select an alternate prior for seg- 
ments which had a small portion of the crop of interest. The prior for 
segments with a very large proportion of the crop of interest might reasonably 
be thought to be like a "flipped" version of the prior for small proportion 
segments. 
Figures 3-4 and 3-5.- Empirical purity distributions (percent of clusters) for segments
with low ground-truth proportions of small grains.

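It was decided that the form of the prior for small proportion segments would be

$$g(\theta) = \beta \left( \theta^{-\alpha} - 1 \right)$$

and that this distribution should satisfy the following constraints

$$g(\theta) \ge 0 \quad \text{for all } 0 < \theta < 1$$

$$\int_0^1 g(\theta)\, d\theta = 1 \tag{16}$$

$$g(1) = 0$$

$$\int_0^1 \theta\, g(\theta)\, d\theta = \hat{P} \tag{17}$$

These constraints may be used to determine the parameters $\alpha$ and $\beta$, which are

$$\alpha = \frac{1 - 4\hat{P}}{1 - 2\hat{P}}, \qquad \beta = \frac{1 - \alpha}{\alpha} \qquad \text{for } 0 < \hat{P} < 0.25 \tag{18}$$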
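
As a check on these parameters, the sketch below assumes the prior has the form $g(\theta) = \beta(\theta^{-\alpha} - 1)$ with $\alpha = (1 - 4P)/(1 - 2P)$ and $\beta = (1 - \alpha)/\alpha$, and evaluates the normalization and mean integrals in closed form. The function names are hypothetical.

```python
def exponential_prior_parameters(p_hat):
    """Parameters (alpha, beta) of the assumed prior beta*(theta**-alpha - 1)."""
    if not 0.0 < p_hat < 0.25:
        raise ValueError("the prior is only defined for 0 < p_hat < 0.25")
    alpha = (1.0 - 4.0 * p_hat) / (1.0 - 2.0 * p_hat)
    beta = (1.0 - alpha) / alpha
    return alpha, beta


def prior_moments(alpha, beta):
    """Closed-form integral and mean of beta*(theta**-alpha - 1) on (0, 1)."""
    total = beta * (1.0 / (1.0 - alpha) - 1.0)
    mean = beta * (1.0 / (2.0 - alpha) - 0.5)
    return total, mean


alpha, beta = exponential_prior_parameters(0.1)
print(alpha, beta, prior_moments(alpha, beta))
```

For P = 0.1 this gives $\alpha$ = 0.75 and $\beta$ = 1/3, and the two moments come out as 1 and 0.1, as the constraints require.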

This prior will be called the exponential prior. In order to see how well the 
quadratic and exponential priors fit the empirical cluster purity histograms, 
the following calculations were made: 

a. The average ground-truth proportion of small grains in the 10 segments used 
to obtain the data reflected in figures 3-1, 3-2, and 3-3 was computed. 

b. The average ground-truth proportion of small grains in the eight segments 
used to obtain the data reflected in figures 3-4 and 3-5 was computed. 

The first proportion, call it $P_1$, was then used to calculate the coefficients
a, b, and c [equation (15)] specifying a quadratic prior. This prior is
plotted in figures 3-1, 3-2, and 3-3 as a smooth curve for comparison with the
empirical histograms. Similarly, the average ground-truth proportion for the
eight small proportion segments, call it $P_2$, was used to calculate the coeffi-
cients $\alpha$ and $\beta$ for an exponential prior. This prior is plotted as a smooth
curve on figures 3-4 and 3-5. It is evident from examining figures 3-1 through
3-5 that both prior distributions seem to fit the empirical cluster purity dis-
tributions well.
In actual practice, both the sequential and the Bayesian sequential procedure
were initiated with random allocation of two pixels per cluster. Following
this allocation, the Bayesian sequential procedure computes two different
estimates of the segment proportion. One is given by

$$\hat{P} = \sum_{i=1}^{c} \frac{N_i}{N_T} \frac{x_i}{n_i} \tag{19}$$

whereas the other is the Bayes posterior estimate based on a quadratic prior
and an average proportion estimate of P = 0.34. The equation for this estimate
is

$$\hat{\theta} = \sum_{i=1}^{c} \frac{N_i}{N_T}\, \hat{\theta}(n_i, x_i) \tag{20}$$

where

$$\hat{\theta}(n_i, x_i) = \frac{a (x_i + 1)(x_i + 2)(x_i + 3) + b (x_i + 1)(x_i + 2)(n_i + 4) + c (x_i + 1)(n_i + 3)(n_i + 4)}{a (x_i + 1)(x_i + 2)(n_i + 4) + b (x_i + 1)(n_i + 3)(n_i + 4) + c (n_i + 2)(n_i + 3)(n_i + 4)} \tag{21}$$
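
The posterior estimate for the quadratic prior is straightforward to evaluate. The sketch below assumes the ratio of cubic polynomials in $x_i$ and $n_i$ that follows from a binomial likelihood and a prior $a\theta^2 + b\theta + c$; with no labeled pixels it should return the prior mean. The function name is hypothetical.

```python
def posterior_mean_quadratic(n_i, x_i, a, b, c):
    """Posterior mean of a cluster's purity under the quadratic prior
    a*theta**2 + b*theta + c, given x_i crop labels out of n_i."""
    numerator = (
        a * (x_i + 1) * (x_i + 2) * (x_i + 3)
        + b * (x_i + 1) * (x_i + 2) * (n_i + 4)
        + c * (x_i + 1) * (n_i + 3) * (n_i + 4)
    )
    denominator = (
        a * (x_i + 1) * (x_i + 2) * (n_i + 4)
        + b * (x_i + 1) * (n_i + 3) * (n_i + 4)
        + c * (n_i + 2) * (n_i + 3) * (n_i + 4)
    )
    return numerator / denominator


# Coefficients for an average proportion of 0.34, as used to start the procedure.
a, b, c = 6.0, 12.0 * (0.34 - 1.0), 5.0 - 6.0 * 0.34
for x in range(3):
    print(x, posterior_mean_quadratic(2, x, a, b, c))
```

After the initial two pixels per cluster, the estimate moves away from 0 and 1 toward the prior, which is what keeps apparently pure clusters eligible for further sampling.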


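If 0.211 < $\hat{P}$, then the quadratic prior is selected and $\hat{\theta}$ is used to reset the
parameters a, b, and c. Sequential selection then proceeds with

$$\Delta\sigma_i^2 = \left( \frac{N_i}{N_T} \right)^2 \left\{ \frac{\hat{\theta}(n_i, x_i) \left[ 1 - \hat{\theta}(n_i, x_i) \right]}{n_i - 1} - \frac{\hat{\theta}(n_i, x_i)\, \hat{\theta}(n_i + 1, x_i + 1) \left[ 1 - \hat{\theta}(n_i + 1, x_i + 1) \right]}{n_i} - \frac{\left[ 1 - \hat{\theta}(n_i, x_i) \right] \hat{\theta}(n_i + 1, x_i) \left[ 1 - \hat{\theta}(n_i + 1, x_i) \right]}{n_i} \right\} \tag{22}$$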

After a number of dots have been allocated, an overall proportion estimate is
obtained via equation (20), using the current values of the $\hat{\theta}(n_i, x_i)$ estimates.
If 0.211 > $\hat{P}$, then the exponential prior is used to calculate the parameters $\alpha$
and $\beta$. Sequential selection then proceeds with $\Delta\sigma_i^2$ given by equation (22),
using

$$\hat{\theta}(n_i, x_i) = \frac{\gamma_1 \dfrac{x_i + 1 - \alpha}{n_i + 2 - \alpha} - \gamma_2 \dfrac{x_i + 1}{n_i + 2}}{\gamma_1 - \gamma_2} \tag{23}$$

where

$$\gamma_1 = (n_i + 1)(n_i)(n_i - 1) \cdots (x_i + 1)$$

$$\gamma_2 = (n_i + 1 - \alpha)(n_i - \alpha) \cdots (x_i + 1 - \alpha)$$

After a number of dots have been allocated, an overall proportion estimate is
obtained as before using equation (20).
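
The posterior estimate under the exponential prior can be sketched in the same way. The closed form used below is an assumption: it is what a binomial likelihood combined with a prior proportional to $\theta^{-\alpha} - 1$ gives when written with the products $\gamma_1$ and $\gamma_2$ defined above. The function name is hypothetical.

```python
import math


def posterior_mean_exponential(n_i, x_i, alpha):
    """Posterior mean of a cluster's purity under a prior proportional to
    theta**-alpha - 1 (0 < alpha < 1), given x_i crop labels out of n_i."""
    factors = range(x_i + 1, n_i + 2)           # x_i + 1, ..., n_i + 1
    gamma1 = math.prod(factors)
    gamma2 = math.prod(k - alpha for k in factors)
    numerator = (
        gamma1 * (x_i + 1 - alpha) / (n_i + 2 - alpha)
        - gamma2 * (x_i + 1) / (n_i + 2)
    )
    return numerator / (gamma1 - gamma2)


# With alpha = 0.75 (prior mean 0.1) and no labels, the prior mean is returned.
print(posterior_mean_exponential(0, 0, 0.75))
print(posterior_mean_exponential(2, 1, 0.75))
```

The normalizing constant $\beta$ cancels in the ratio, so only $\alpha$ is needed here.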

Figure 3-6 shows a comparison of the quadratic and exponential priors at the
value $\hat{P}$ = 0.211, where the switch occurs from one to the other. The curves
are close enough for this value of $\hat{P}$ that the decision as to which one to use
is not critical.


Outlined in this section are six different techniques for cluster-based pro-
portion estimation. As a way of summarizing these developments, a brief dis-
cussion on some of the expected characteristics of these techniques follows.

Three cluster-labeling and three stratified proportion-estimation schemes have
been considered. If the clusters are very pure, then cluster labeling should
produce proportion estimates with small bias and very small variance. In
addition, relatively few labeled pixels should be required to obtain these
estimates, and the estimates themselves should not be very sensitive to occa-
sional labeling errors. Cluster labeling using sequential allocation or Baye-
sian sequential allocation provides a specified confidence in the labels of
clusters. These techniques should require fewer dots to be labeled on the
average than does cluster labeling using proportional allocation.


Figure 3-6.- Comparison of quadratic and exponential priors.


If the clusters are significantly mixed, all of the cluster-labeling schemes
will suffer. In this case, a more appropriate technique is provided by
stratified proportion estimation. Stratified proportion estimation using
proportional allocation provides theoretically unbiased estimates. Stratified
proportion estimation using sequential or Bayesian sequential allocation is
not theoretically unbiased but should produce estimates with a lower
mean-square error, for a given number of dots allocated, than the proportional
allocation approach. Both of the sequential techniques incorporate information
about both the size and the estimated purity of clusters in performing the
dot allocation.








4. DATA SET AND EXPERIMENTAL DESIGN 

The data set for this study consisted of 25 LACIE segments selected at random
from the Phase III (1976-1977) blind site data base. Eighteen of the segments
are the same as those used in the secondary error analysis study (refs. 2
and 3). Seven substitutions in the secondary error analysis data set were
necessary because the original segments were not well registered to the
digitized ground truth. The segments selected represent a cross section of the
U.S. Great Plains. Both winter- and spring-wheat segments were included.

Three segments in the data set were discovered to have significant amounts of
strip fallow small grains where the strips were not resolved in the ground
truth. These segments, 1648, 1739, and 1544, were clustered but were not
evaluated using the proportion-estimation schemes because reliable labels were
not available for the strip fallow area. One other segment, 1079, was not
evaluated using the proportion-estimation schemes because it was found to
contain 27 percent abandoned winter wheat and was, thus, a very atypical
segment. Table 4-1 lists the 21 segments actually used in the testing, their
locations, the acquisitions used, and the proportion of small grains from
the digitized ground truth.

The experimental design for the evaluation of the six proportion-estimation
techniques was to evaluate each of them first on a subset of five segments
selected from the set of 21 acceptable segments. The subset that was
selected consisted of segments 1005, 1853, 1520, 1231, and 1060. After
evaluating these preliminary results, the most promising techniques were
selected and run on the remainder of the 21 segments.

Each combination of proportion-estimation technique and clustering algorithm
was repeated 100 times for each segment. Each repetition used a different
pseudorandom sequence in selecting pixels. Thus, it was possible to calculate
the average bias in the proportion estimate, the mean-square error of the
estimate, and the R factor as compared to simple random sampling. These results
are reported in the appendix. Averages and variances of these results over
segments were also calculated. These results appear in section 5.
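
The per-segment summary described above can be sketched as follows. This is a minimal illustration, not the code used in the study. It assumes that the R factor is the mean-square error of the estimate divided by the variance p(1 - p)/n of a simple random sample of the same size; the function name and the simulated estimates are hypothetical.

```python
import random


def evaluate_repetitions(estimates, true_p, n_pixels):
    """Summarize repeated proportion estimates for one segment.

    Returns (average bias, mean-square error, R), where R is taken here
    (an assumption) as MSE divided by the simple-random-sampling variance
    p(1 - p)/n for the same number of labeled pixels.
    """
    reps = len(estimates)
    bias = sum(estimates) / reps - true_p
    mse = sum((e - true_p) ** 2 for e in estimates) / reps
    srs_variance = true_p * (1.0 - true_p) / n_pixels
    return bias, mse, mse / srs_variance


# Hypothetical check: 100 repetitions of simple random sampling itself,
# for which R should come out in the neighborhood of 1.
rng = random.Random(0)
true_p, n_pixels = 0.348, 60
estimates = [
    sum(rng.random() < true_p for _ in range(n_pixels)) / n_pixels
    for _ in range(100)
]
bias, mse, r = evaluate_repetitions(estimates, true_p, n_pixels)
print(f"bias={bias:.4f}  mse={mse:.6f}  R={r:.3f}")
```

A technique is more efficient than simple random sampling when its R is below 1.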
TABLE 4-1.- DESCRIPTION OF THE TWENTY-ONE SEGMENTS USED IN THE STUDY

                                                               Ground-truth
                                                               proportion of
Segment     Location                  Acquisitions used        small grains

1005 (W)    Cheyenne, Colorado        7177, 7159, 6326, 6254      0.348
1032 (W)    Wichita, Kansas           7194, 7086, 6326, 6254       .371
1033 (W)    Clark, Kansas             7156, 6288                   .095
1853 (W)    Ness, Kansas              7193, 7067, 6253             .306
1166 (W)    Lyon, Kansas              7190, 7154, 7082, 6286       .066
1512 (S)    Clay, Minnesota           7193, 7156                   .340
1520 (S)    Big Stone, Minnesota      7174, 7156, 7120             .301
1577 (M)    Platte, Nebraska          7120, 6306                   .029
1604 (S)    Renville, North Dakota    7143, 7125                   .524
1606 (S)    Ward, North Dakota        7197, 7125                   .330
1661 (S)    McIntosh, North Dakota    7159, 7123                   .414
1899 (S)    Walsh, North Dakota       7193, 7175, 7157, 7122       .596
1231 (W)    Jackson, Oklahoma         7156, 7066, 6288             .744
1239 (W)    Noble, Oklahoma           7155, 7082, 6268             .167
1367 (W)    Major, Oklahoma           7155, 7101, 6287             .606
1675 (S)    McPherson, South Dakota   7230, 7176, 7123, 6254       .291
1686 (S)    Beadle, South Dakota      7194, 7140, 6307, 6254       .194
1803 (W)    Shannon, South Dakota     7178, 7159, 7123, 6255       .032
1805 (M)    Gregory, South Dakota     7211, 7158, 6307, 6290       .164
1059 (W)    Ochiltree, Texas          7157, 7121, 6325, 6307       .437
1060 (W)    Sherman, Texas            7158, 7068                   .231

Symbol definition:

M = Mixed
S = Spring wheat
W = Winter wheat
5. RESULTS


The results of the study are summarized in two parts. The first part pertains
to the evaluation of the clustering algorithms, and the second part is an
evaluation and comparison of the six techniques for proportion estimation.

The R, as compared to simple random sampling, and the PCC, using majority rule
labeling, are given in table 5-1 for each of the three algorithms tested as
applied to each of the 21 segments. Averages for each measure over segments
are given at the bottom of the table along with an estimate of the standard
deviation over segments. None of the averages are significantly different.
In fact, it is striking how similar the average results are in view of the
differences in the algorithms. This similarity will be further discussed in
section 6.

One significant difference is in the number of clusters produced by each algo- 
rithm. At the bottom of table 5-1, the average number of clusters and the 
standard deviation in the number of clusters are indicated. The average number 
of clusters nearly doubles when going from CLASSY to AMOEBA and doubles again 
in going from AMOEBA to ISOCLS. Economy in the number of clusters produced 
is generally considered a distinct advantage for a clustering algorithm. It 
is clearly an advantage in the stratified proportion-estimation techniques. 
Indeed the sequential stratified techniques require that a fixed number of 
pixels (usually 2) be allocated to each cluster initially. Thus, a large 
number of clusters means that a large number of pixels must be allocated 
before sequential allocation even begins. 
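
The cost of the initial allocation can be illustrated with a short calculation. The sketch below is hypothetical; it uses the average cluster counts quoted for the three algorithms (about 9, 17, and 37) and the two-pixel-per-cluster initial allocation, and simply rounds the product up to a whole number of pixels.

```python
import math

# Average number of clusters per algorithm, as read from table 5-1.
AVERAGE_CLUSTERS = {"CLASSY": 9.32, "AMOEBA": 17.46, "ISOCLS": 36.84}


def initial_allocation(n_clusters, pixels_per_cluster=2):
    """Pixels that must be labeled before sequential allocation can start."""
    return math.ceil(n_clusters * pixels_per_cluster)


for name, clusters in AVERAGE_CLUSTERS.items():
    print(f"{name}: about {initial_allocation(clusters)} pixels before "
          "sequential allocation begins")
```

On these averages, an ISOCLS clustering already consumes more than 60 pixels in the initial allocation alone, whereas a CLASSY clustering consumes fewer than 20.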

Presented in tables 5-2, 5-3, and 5-4 are the results for the three cluster- 
labeling schemes; and in tables 5-5, 5-6, and 5-7 are the results for the 
three stratified proportion-estimation schemes. The results presented in each 
table are averages and variances over the segments processed for each of the 
measures recorded, using a given scheme. For each scheme, with the exception 
of stratified proportion estimation using proportional allocation, the meas- 
ures recorded were the average bias, the mean-square error, and the reduction 
TABLE 5-1.- PCC VALUES USING MAJORITY RULE LABELING AND
R VALUES FOR CLASSY, AMOEBA, AND ISOCLS

                  CLASSY              AMOEBA          ISOCLS
Segment        PCC       R         PCC       R         PCC

1005 (W)      0.8398   0.9u71    0.9132   0.6372     0.8659
1032 (W)       .8975    .3450     .8541    .4585      .8367
1033 (W)       .9050    .6208     .9151    .7363      .9247
1853 (W)       .8948    .4073     .7926    .6966      .8859
1166 (W)       .9333    .8287     .9388    .7857      .9386
1512 (S)       .7110    .8269     .7621    .7481      .7576
1520 (S)       .6361    .5758     .8522    .5213      .8546
1577 (W)       .9678    .9055     .9678    .9076      .9684
1604 (S)       .6877    .8419     .7318    .7538      .6749
1606 (S)       .8229    .6071     .8002    .6511      .7958
1661 (S)       .7260    .7395     .7523    .6745      .7184
1899 (S)       .8427    .4852     .8555    .4684      .8426
1231 (W)       .8773    .4849     .8926    .4450      .8788
1239 (W)       .8508    .7175     .8702    .6586      .8601
1367 (W)       .8023    .5654     .8198    .5644      .8051
1675 (S)       .7929    .7056     .8060    .6243      .7890
1686 (S)       .6352    .7847     .8485    .6933      .8400
1803 (W)       .9681    .8313     .9701    .7339      .9733
1805 (M)       .9052    .5007     .9199    .4680      .9219
1059 (W)       .8448    .4515     .8667    .4126      .8768
1060 (W)       .8583    .5984     .8824    .5227      .8757

Average        .8476    .6472     .8521    .6268      .8488

Standard
deviation      .0754    .1663     .0688    .1333      .0771

Average number
of clusters,
± 1 standard
deviation        9.32 ± 2.15       17.46 ± 10.15    36.84 ± 2.32

TABLE 5-2.- MAJORITY RULE LABELING USING PROPORTIONAL ALLOCATION RESULTS FOR
FIVE SEGMENTS

TABLE 5-3.- MAJORITY RULE LABELING USING SEQUENTIAL ALLOCATION RESULTS FOR
FIVE SEGMENTS, THREE-PIXEL PER CLUSTER INITIAL ALLOCATION

TABLE 5-4.- MAJORITY RULE LABELING USING BAYESIAN SEQUENTIAL ALLOCATION RESULTS FOR
FIVE SEGMENTS, TWO-PIXEL PER CLUSTER INITIAL ALLOCATION
TABLE 5-5.- STRATIFIED PROPORTION ESTIMATION USING PROPORTIONAL ALLOCATION
RESULTS FOR TWENTY-ONE SEGMENTS

TABLE 5-6.- STRATIFIED PROPORTION ESTIMATION USING SEQUENTIAL ALLOCATION RESULTS FOR
FIVE SEGMENTS, THREE-PIXEL PER CLUSTER INITIAL ALLOCATION

TABLE 5-7.- STRATIFIED PROPORTION ESTIMATION USING BAYESIAN SEQUENTIAL ALLOCATION
RESULTS FOR TWENTY-ONE SEGMENTS, TWO-PIXEL PER CLUSTER INITIAL ALLOCATION
in mean-square error as compared to simple random sampling. Because stratified
proportion estimation (using proportional allocation) is theoretically unbiased,
the bias was not recorded; the variance and the R, rather than the mean-square
error and reduction in mean-square error, were recorded. The techniques using
sequential allocation for majority-rule labeling did not allocate a fixed
number of pixels, and hence, only the average number of pixels allocated is
reported. The sequential Bayesian technique used an initial allocation of two
pixels per cluster, whereas the sequential technique without prior used a
three-pixel-per-cluster initial allocation. The same initial allocations were
used for the Bayesian and "no prior" sequential techniques that were used in
stratified proportion estimation. The missing values in tables 5-6 and 5-7
indicate that in some cases sequential allocation could not begin until a
larger number of dots had been allocated.

After examining the results for the subset of five segments, it was clear that
all of the cluster-labeling schemes as well as the stratified proportion
estimation using sequential allocation were not competitive with stratified
proportion estimation using either proportional allocation or Bayesian
sequential allocation. This is most readily apparent in a comparison of the
reduction in mean-square error or R results.

The technique using sequential allocation in obtaining stratified proportion
estimates does look competitive at an allocation of 30 pixels. Because it
was not significantly better than stratified proportion estimation using
Bayesian sequential allocation, it was decided to place the most emphasis on
a comparison of the Bayesian sequential and the proportional allocation
techniques as used in obtaining stratified proportion estimates. Consequently,
tables 5-5 and 5-7 represent results for the full 21 segments, whereas tables
5-2, 5-3, 5-4, and 5-6 represent the results for five segments.

Figures 5-1 and 5-2 present, in histogram form, the same data that are
summarized in tables 5-5 and 5-7. Figure 5-3 is a comparative histogram plot
of R values for Procedure 1, which are reported in reference 3. In this plot,
it is assumed that there is an allocation of pixels equal to the number of
type 2 dots used in each estimate (approximately 60 pixels). The complete data
for each of the six proportion-estimation techniques studied are in the
appendix of this report.

Figure 5-1.- Histogram plots of the R for stratified proportion estimation
using proportional allocation.

Figure 5-2.- Histogram plots of the reduction in mean-square
error for stratified proportion estimation using Bayesian
sequential allocation.

Figure 5-3.- Histogram plot of the R for Procedure 1
based on approximately 60 pixels (type 2) per
estimate.

The results in table 5-5 are essentially an empirical verification of the 
results in table 5-1. In particular, the R averages may be compared. In 
theory, the R (using this technique) should be independent of the number of 
dots allocated. Indeed, there are no significant differences among the values 
of average R calculated for 30, 60, 90, or 120 dots. In addition, the averages 
for each algorithm tend to agree well with the theoretical average R values 
appearing in table 5-1. 
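
Why R should not depend on the number of dots under proportional allocation can be seen from a small sketch. It rests on assumptions not spelled out in this section: finite-population corrections are ignored, cluster i has weight W_i = N_i/N and purity theta_i, and R is the variance of the stratified estimate divided by the variance of a simple random sample of the same size n. Under these assumptions both variances carry the same factor 1/n, which cancels in the ratio. The function name and the example clusters are hypothetical.

```python
def theoretical_r(cluster_sizes, cluster_purities):
    """Theoretical R for stratified sampling with proportional allocation.

    Stratified variance: sum_i W_i * theta_i * (1 - theta_i) / n
    SRS variance:        p * (1 - p) / n, with p = sum_i W_i * theta_i
    The ratio is free of n.
    """
    total = sum(cluster_sizes)
    weights = [size / total for size in cluster_sizes]
    p = sum(w * t for w, t in zip(weights, cluster_purities))
    within = sum(w * t * (1.0 - t) for w, t in zip(weights, cluster_purities))
    return within / (p * (1.0 - p))


# Hypothetical segment with two equally sized, fairly pure clusters.
print(theoretical_r([50, 50], [0.9, 0.1]))
```

Perfectly pure clusters give R = 0, and clusters that all share the segment proportion give R = 1, that is, no gain over simple random sampling.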

In examining table 5-7, it is clear that the Bayesian sequential allocation 
technique, as used in obtaining stratified proportion estimates, has an ex- 
tremely low bias for all three algorithms even though the procedure itself is 
not theoretically unbiased. None of the average bias results in this table 
for any of the algorithms are significantly different from zero. 

A comparison of the average reduction in mean-square error for the Bayesian
sequential allocation technique (table 5-7) with the average R for the
proportional allocation technique (table 5-5) shows that using the Bayesian
sequential approach with the CLASSY algorithm gives results which are
consistently lower than proportional allocation for all numbers of pixels
allocated. If the variances for each technique-algorithm combination are pooled
over the various numbers of pixels allocated, the results are as given in
table 5-8.


TABLE 5-8.- POOLED VARIANCES FOR SEQUENTIAL ALLOCATION TECHNIQUES

              Bayesian sequential allocation      Proportional allocation

              CLASSY     AMOEBA     ISOCLS        CLASSY     AMOEBA     ISOCLS

Pooled
variances    0.038699   0.079350   0.019605      0.036897   0.024976   0.033507


Table 5-9 gives the least significant differences (LSD) for comparisons
between the two allocation techniques within the results for a given
clustering algorithm. The LSD is computed as

$$\mathrm{LSD} = t \sqrt{\frac{S_1^2 + S_2^2}{21}} \tag{24}$$

where $S_1^2$ and $S_2^2$ are the pooled variance estimates of the groups to be
compared and t is the 0.975 percentage point of the Student's t distribution
with 80 degrees of freedom (t = 1.99).
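
As a numerical cross-check, the LSD values can be recomputed from the pooled variances. The sketch below assumes the form LSD = t * sqrt((S1^2 + S2^2)/21), with t = 1.99 and a divisor of 21; this form is an inference, chosen because it reproduces the tabulated LSD values to six decimal places, and the assignment of the six pooled variances to algorithms follows the column order of table 5-8.

```python
import math

# Pooled variances: (Bayesian sequential, proportional) for each algorithm.
POOLED_VARIANCES = {
    "CLASSY": (0.038699, 0.036897),
    "AMOEBA": (0.079350, 0.024976),
    "ISOCLS": (0.019605, 0.033507),
}


def least_significant_difference(var_1, var_2, n=21, t=1.99):
    """LSD between two group means with pooled variances var_1 and var_2."""
    return t * math.sqrt((var_1 + var_2) / n)


for name, (bayes_var, prop_var) in POOLED_VARIANCES.items():
    print(f"{name}: LSD = {least_significant_difference(bayes_var, prop_var):.6f}")
```

A difference in average R between the two techniques that exceeds the LSD in absolute value is treated as significant.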


TABLE 5-9.- LEAST SIGNIFICANT DIFFERENCES FOR COMPARISONS BETWEEN
BAYESIAN SEQUENTIAL AND PROPORTIONAL ALLOCATION TECHNIQUES
FOR STRATIFIED PROPORTION ESTIMATION

               CLASSY      AMOEBA      ISOCLS

LSD in
R values      0.119397    0.140262    0.100078


























The differences between the corresponding R values for tables 5-5 and 5-7 are
given in table 5-10.

TABLE 5-10.- VALUES FOR sequential 


Pixels      CLASSY         AMOEBA          ISOCLS

  30       *0.200682     **-0.140867
  60      **0.119384       -.086566
  90       *0.168540       -.066167       *0.182187
 120      **0.116789       -.075886      **0.096402

*Significant at the 0.05 level.
**Marginally significant at the 0.05 level.

A comparison of table 5-10 with the LSD values in table 5-9 shows that the
CLASSY results for each number of pixels and the ISOCLS results for 90 and 120
pixels are either significant or very nearly significant at the 0.05 level.
ISOCLS results are not available for 30 and 60 pixels because more than 60
pixels had already been allocated by the two-pixel-per-cluster initial
allocation in the Bayesian sequential procedure. The AMOEBA results for the
Bayesian procedure are consistently higher than for the proportional allocation
procedure, and in the case of 30 pixels allocated, the reduction in
mean-square-error value was significantly higher.
6. CONCLUSIONS AND RECOMMENDATIONS


The clustering algorithms CLASSY, AMOEBA, and ISOCLS performed comparably with
respect to the PCC using majority-rule labeling and the R measures. The fact
that the average results for all three algorithms were so similar and that the
average R value for Procedure 1 has been reported in several independent
studies to be about this same value (0.65 - 0.70) suggests there is a
fundamental limitation in the separability of the data which precludes better
performance. This idea should be tested further in later studies. The fact that
CLASSY had, on the average, only about 9 clusters, whereas AMOEBA had about
17 and ISOCLS had almost 37, is seen as important. Given the same overall
level of performance, an economy in the number of clusters produced is to be
preferred.

The cluster-labeling techniques appear to suffer from the same fate. The
proportion estimates obtained using these techniques were generally biased; the
R values were always greater than 0.9, and typically they were greater than 1.
This poor performance for all of the clustering algorithms indicates that
clusters were simply not pure enough for cluster labeling to function
efficiently as a proportion-estimation technique. For all three clustering
algorithms, the average PCC value, which may be thought of as a measure of
cluster purity, was about 0.85. Apparently, much greater cluster purity is
needed for cluster labeling to be a viable approach.

The stratified proportion-estimation techniques generally worked well. The 
sequential allocation approach with no prior distribution on cluster purities 
produced good results for an allocation of 30 pixels; however, the results for 
allocations of 60, 90, and 120 pixels were biased and had much larger reduction 
in mean-square error values for all of the clustering algorithms. In addi- 
tion, these results were obtained with an initial allocation of three pixels 
per cluster, which means that in many cases, sequential allocation did not 
begin until more than 30 pixels had been allocated. 

The study eventually focused on a comparison of the Bayesian sequential
allocation technique and the proportional allocation technique for stratified
proportion estimation. Both of these techniques are unbiased. The proportional
allocation technique has an R value of about 0.67, which does not differ
significantly from algorithm to algorithm or for different numbers of pixels
allocated. This result is also not much different from the Procedure 1 value.
However, the Bayesian sequential allocation technique, when used with the
CLASSY or ISOCLS clustering algorithm, has significantly lower reduction in
mean-square-error values than does proportional allocation. The fact that
CLASSY has many fewer clusters than ISOCLS and, thus, is able to begin
allocating sequentially at a much lower number of dots makes it the preferred
algorithm.

The recommendation of this report is that studies be undertaken to determine
how best to implement stratified proportion estimation using CLASSY clusters
as the strata and the Bayesian sequential technique for pixel allocation. It
appears that a total allocation of 30 pixels would achieve the minimum R. The
average mean-square error for this number of pixels is 0.002853, which
compares very favorably with the average variance of 0.002515 calculated from
the results of the Procedure 1 secondary error analysis study (ref. 3). This
variance for Procedure 1 was obtained with about 100 labeled pixels for each
estimate (approximately 40 type 1 pixels plus approximately 60 type 2 pixels).
Thus, an allocation of only 30 total dots represents a very clear advantage for
the proposed replacement procedure for Procedure 1.
APPENDIX


CALCULATION RESULTS OF THE AVERAGE BIAS IN THE PROPORTION ESTIMATE, 
THE MEAN-SQUARE ERROR OF THE ESTIMATE, AND THE VARIANCE REDUCTION 
FACTOR AS COMPARED TO SIMPLE RANDOM SAMPLING 
EVALUATION OF BAYESIAN SEQUENTIAL PROPORTION ESTIMATION
USING ANALYST LABELS*

By R. K. Lennington and K. M. Abotteen

1. INTRODUCTION

A previous study by R. K. Lennington and J. K. Johnson (ref. 1) concluded by
recommending a new procedure for crop proportion estimation. The procedure
consisted of two steps. First, the Landsat data were to be clustered using
the CLASSY clustering algorithm. Then, picture elements (pixels) were to be
allocated to each cluster stratum and labeled using a sequential Bayesian
allocation scheme developed by M. D. Pore (ref. 2). The labeled pixels were used
to form a posterior distribution Bayes estimate of the proportion of the class
of interest. In tests involving ground-truth data from 21 blind sites used in
Phase III of the Large Area Crop Inventory Experiment (LACIE), this procedure
was unbiased and had an estimated mean squared error (MSE) approximately equal
to that of a procedure called Procedure 1 (which is based on the sampling of
individual pixels) while using only one-third of the total number of labeled
pixels (ref. 1).

In order to explore the feasibility of the new procedure in an actual labeling
situation and to perform a preliminary evaluation of its characteristics using
analyst labels, a test involving 10 Phase III segments was undertaken.

Section 2 describes the procedure used for selecting pixels to be labeled and 
the method for obtaining proportion estimates. The data set used in the 
experiment is described in section 3, while the results pertaining to the 
accuracy of the analyst labels and the bias and MSE of the proportion esti- 
mates obtained using these labels are described in section 4. Section 4 also 
presents the conclusion and recommendations. 


*Published by Lockheed Engineering and Management Services Company, Inc.,
LEMSCO-14355, NASA/JSC (Houston), April 1980.
2. LABELING PROCEDURE


For the purposes of this test, the Bayesian sequential allocation procedure
was implemented on a Texas Instruments TI-59 programmable calculator. The
version of the allocation procedure implemented was slightly different from
the procedure used in the previous study (ref. 1) in that a beta distribution
was used for the prior distribution of cluster purities rather than a
quadratic or exponential distribution. The form of the distribution used was
as follows.


$$g(\theta_i) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\,\theta_i^{\,a-1}\,(1-\theta_i)^{\,b-1} \tag{1}$$

where

$b = 1$

$a = \dfrac{\hat{p}}{1-\hat{p}}$

$\hat{p}$ = the estimated proportion of the class of interest in the whole
segment

$\theta_i$ = the proportion of the class of interest in cluster i

$g$ = the prior distribution of cluster purities

The choice of the parameters a and b ensures that the mean of the
distribution will be $\hat{p}$. The parameter b was chosen to be fixed at a
value of 1 because that value seemed to give the best fit to the previously
obtained empirical prior distributions (ref. 1). Initially, the parameter a was
chosen to be 0.515, corresponding to a $\hat{p}$ of 0.34.
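
The relation between the prior parameters and the segment proportion can be checked with a short calculation. The sketch below uses the standard result that a beta distribution with parameters a and b has mean a/(a + b); the function names are hypothetical.

```python
def beta_prior_a(p_hat, b=1.0):
    """Value of a for which a beta(a, b) prior has mean p_hat."""
    return b * p_hat / (1.0 - p_hat)


def beta_mean(a, b=1.0):
    """Mean of a beta(a, b) distribution."""
    return a / (a + b)


a_initial = beta_prior_a(0.34)
print(f"a = {a_initial:.3f}, prior mean = {beta_mean(a_initial):.2f}")
```

With b fixed at 1, an estimated proportion of 0.34 gives a of about 0.515, which matches the initial value quoted above.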

The beta prior distribution, although not identical to the prior distributions
used in the previous study, is not greatly different and does offer some
advantages. It may be used over the entire range of segment proportions;
hence, the use of one prior distribution for large proportion segments and
another for small proportion segments is unnecessary. Also, the similarity of
the beta distribution to the binomial distribution allows the calculation of
the Bayes posterior distribution estimator for $\theta_i$ and the expressions
for the bias and variance of this estimator with comparative ease. In fact, the
beta distribution is called a "natural conjugate prior distribution" to the
binomial distribution for this reason. In addition, tests performed subsequent
to the work reported in reference 1 showed that use of the beta prior
distribution with ground-truth labels produced results which were at least as
good as those produced using the combination of a quadratic and exponential
prior distribution.



Using the beta prior distribution, the Bayes posterior distribution
estimator for $\theta_i$ becomes

$$\hat{\theta}_i = \frac{x_i + a}{n_i + a + b} \tag{2}$$

where

$n_i$ = the total number of pixels sampled from cluster i

$x_i$ = the number of sampled pixels which belong to the class of interest

The bias and MSE of this estimator are

$$\mathrm{Bias}_i = E(\hat{\theta}_i - \theta_i) = \frac{a(1-\theta_i) - b\,\theta_i}{n_i + a + b} \tag{3}$$

$$\mathrm{MSE}_i = \frac{n_i\,\theta_i(1-\theta_i) + \left[a(1-\theta_i) - b\,\theta_i\right]^2}{(n_i + a + b)^2} \tag{4}$$

where E = the expected value operator.
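
The estimator and its bias and MSE can be verified numerically. The sketch below is illustrative only: it assumes that the number of class-of-interest pixels x in a sample of n from a cluster of purity theta is binomial, and it uses the bias in the form [a(1 - theta) - b*theta]/(n + a + b), which is the form that follows from taking the expectation of the estimator. The closed-form expressions are compared with an exact enumeration over all possible values of x. All function names are hypothetical.

```python
from math import comb


def bayes_estimate(x, n, a, b=1.0):
    """Posterior-mean estimate of cluster purity under a beta(a, b) prior."""
    return (x + a) / (n + a + b)


def bias(theta, n, a, b=1.0):
    """Closed-form bias of the estimator for true purity theta."""
    return (a * (1.0 - theta) - b * theta) / (n + a + b)


def mse(theta, n, a, b=1.0):
    """Closed-form mean-square error of the estimator."""
    squared_bias_term = (a * (1.0 - theta) - b * theta) ** 2
    return (n * theta * (1.0 - theta) + squared_bias_term) / (n + a + b) ** 2


def exact_moments(theta, n, a, b=1.0):
    """Bias and MSE by enumerating x ~ Binomial(n, theta)."""
    probs = [comb(n, x) * theta**x * (1.0 - theta) ** (n - x) for x in range(n + 1)]
    errors = [bayes_estimate(x, n, a, b) - theta for x in range(n + 1)]
    exact_bias = sum(p * e for p, e in zip(probs, errors))
    exact_mse = sum(p * e * e for p, e in zip(probs, errors))
    return exact_bias, exact_mse


theta, n, a = 0.3, 5, 0.515
print(bias(theta, n, a), mse(theta, n, a))
print(exact_moments(theta, n, a))
```

The two printed pairs agree to within floating-point rounding, which confirms that the variance term and the squared-bias term in the MSE are consistent with the estimator.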


The allocation procedure begins with the allocation of two random pixels to
each cluster. At this point, $\hat{p}$ is calculated as

$$\hat{p} = \sum_{i=1}^{c} \frac{N_i}{N}\,\hat{\theta}_i \tag{5}$$

where

$N_i$ = the number of pixels in cluster i

$N$ = the total number of pixels in the segment

$c$ = the number of clusters

The parameter a is then reset using the equation

$$a = \frac{\hat{p}}{1 - \hat{p}}$$

At this point, the sequential allocation of pixels begins. Succeeding pixels
are allocated to clusters which will minimize the expected value of an
estimator of the overall MSE for the segment proportion estimate $\hat{p}$.


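The MSE for $\hat{p}$ may be written as

$$\mathrm{MSE} = \sum_{i=1}^{c} \left(\frac{N_i}{N}\right)^2 \mathrm{MSE}_i \tag{6}$$

By using $\hat{\theta}_i$ in place of $\theta_i$ in equation (4), $\mathrm{MSE}_i$
may be estimated. We will denote this estimator as $\widehat{\mathrm{MSE}}_i(x_i, n_i)$.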


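The expected reduction in the estimated MSE by labeling another pixel from
cluster i becomes

$$\Delta\mathrm{MSE}_i = \left(\frac{N_i}{N}\right)^2 \left\{ \widehat{\mathrm{MSE}}_i(x_i, n_i) - \left[ \hat{\theta}_i\,\widehat{\mathrm{MSE}}_i(x_i + 1, n_i + 1) + (1 - \hat{\theta}_i)\,\widehat{\mathrm{MSE}}_i(x_i, n_i + 1) \right] \right\} \tag{7}$$

Thus, each successive pixel is chosen at random from the cluster having the
largest value of $\Delta\mathrm{MSE}_i$.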
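
One step of the sequential selection can be sketched as follows. This is a minimal illustration under stated assumptions, not the TI-59 program: the estimated cluster MSE is obtained by substituting the posterior-mean estimate for the true purity in the closed-form MSE, the weight of cluster i is taken as (N_i/N) squared, and all function names, counts, and cluster sizes are hypothetical.

```python
def estimated_mse(x, n, a, b=1.0):
    """Cluster MSE with the posterior-mean estimate plugged in for theta."""
    theta_hat = (x + a) / (n + a + b)
    squared_bias_term = (a * (1.0 - theta_hat) - b * theta_hat) ** 2
    return (n * theta_hat * (1.0 - theta_hat) + squared_bias_term) / (n + a + b) ** 2


def expected_reduction(x, n, cluster_size, segment_size, a, b=1.0):
    """Expected drop in estimated MSE from labeling one more pixel."""
    theta_hat = (x + a) / (n + a + b)
    # The next pixel is of the class of interest with estimated probability
    # theta_hat (x and n both increase), otherwise only n increases.
    after = (theta_hat * estimated_mse(x + 1, n + 1, a, b)
             + (1.0 - theta_hat) * estimated_mse(x, n + 1, a, b))
    weight = (cluster_size / segment_size) ** 2
    return weight * (estimated_mse(x, n, a, b) - after)


def next_cluster(counts, sizes, a, b=1.0):
    """Index of the cluster with the largest expected reduction."""
    total = sum(sizes)
    gains = [expected_reduction(x, n, size, total, a, b)
             for (x, n), size in zip(counts, sizes)]
    return max(range(len(gains)), key=gains.__getitem__)


# Hypothetical state after the two-pixel initial allocation:
# (x_i, n_i) per cluster and the cluster sizes in pixels.
counts = [(1, 2), (1, 2)]
sizes = [300, 700]
print(next_cluster(counts, sizes, a=0.515))
```

With identical label counts, the larger cluster is selected because its weight is larger; as pixels accumulate in a cluster, its expected reduction shrinks and the allocation moves on to other clusters.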


In practice, the CLASSY clustering algorithm was first run on a given
segment. Then each of the 209 grid intersection pixels was associated with
the cluster in which it was placed, and the grid intersection pixels falling
in each cluster were listed in a randomized order. The randomized list also
contained the label of each pixel that had been previously labeled by an
analyst and indicated whether the labeled pixel was a type I or type II dot.

In selecting pixels from clusters, the first to be selected from the
randomized list were the type II dots for which analyst labels were available.
When these pixels were exhausted, others were chosen according to the
randomized order within clusters. If a type I dot fell in this sequence, its
label was used. Dots other than type I were labeled by one of the authors
(K. Abotteen) using standard analyst procedures. A total of 45 pixels were
allocated and labeled for each segment.

3. DATA SET AND EXPERIMENTAL DESIGN 

The data set for this experiment consisted of 10 Phase III blind sites chosen
as a subset of the 21 segments used in the previous study (ref. 1). These
segments were chosen to be representative of the previously used, larger data
set with regard to geographical location and range of segment proportions of
small grains. These segments and acquisitions, along with their location and
the ground-truth proportion of small grains in each segment, are given in
table 1.

The experimental design consisted of selecting and labeling 45 grid
intersection dots from each segment. Repeated processings were not attempted
due to the limited number of analyst labels available.

4. RESULTS 

This study provides the data for answering two important questions relative to
the use of analyst labels with the Bayesian sequential allocation procedure.
The first question concerns analyst accuracy in labeling pixels. Since in the
Bayesian sequential procedure more pixels are allocated to mixed clusters, it
was thought that the analyst labeling accuracy might decrease. The second
question concerns the bias and MSE of the proportion estimate resulting from
the procedure as compared to the bias and MSE of a simple random sample of the
TABLE 1.- DESCRIPTION OF THE DATA SET

                                                              Ground-truth
                                                              proportion of
Segment     Location                  Acquisitions used       small grains

1005(w)     Cheyenne, Colorado        7177, 7159, 6326, 6254     0.348
1033(w)     Clark, Kansas             7156, 6288                  .095
1060(w)     Sherman, Texas            7158, 7068                  .231
1231(w)     Jackson, Oklahoma         7156, 7066, 6288            .744
1520(w)     Big Stone, Minnesota      7174, 7156, 7120            .301
1604(s)     Renville, North Dakota    7143, 7125                  .524
1675(s)     McPherson, South Dakota   7230, 7176, 7123, 6254      .291
1803(w)     Shannon, South Dakota     7178, 7159, 7123, 6255      .032
1805(m)     Gregory, South Dakota     7211, 7158, 6307, 6290      .164
1853(w)     Ness, Kansas              7193, 7067, 6253            .306

Symbol definition:

w = winter wheat
s = spring wheat
m = mixed wheat



same size. Analyst accuracy will be examined first, followed by results
concerning the proportion estimate itself.

Table 2 shows the error rate in labeling small grains (percentage of
ground-truth small grain pixels labeled "other") and the error rate in labeling
"other" (percentage of ground-truth "other" pixels labeled small grains) for
the 45 pixels that were sequentially allocated to each segment. The
corresponding error rates for the type II dots that are selected as a simple
random sample are also given. It should be noted that in every case the error
rate in labeling small grain pixels was lower for the sequentially allocated
pixels than for the type II dots. The error rate in labeling "other" pixels was
lower in two cases for the sequentially allocated pixels; however, the error
rate in labeling "other" pixels was generally fairly low for both types of
allocations.

As another test, one may examine the total number of labeling errors using a
sequential Bayesian allocation and compare this to the expected total number
of errors based on the error rate for the type II dots. The expected number
of errors was calculated by multiplying the total error rate calculated from
the type II dots by 45. These data are given in table 3. A chi-square test
of these observed and expected numbers of errors yields a value of

$$\chi^2 = 14.811$$

With 9 degrees of freedom, the 5 percent significance level of the random
variable is 16.9. Hence, at this level of significance, we fail to reject the
hypothesis that the observed numbers of errors are not different from the
expected numbers of errors based on the simple random sample of type II dots.

It should be noted that the chi-square test may fail to hold since three of
the segments have an expected number of errors less than five. However, the
test may be taken as an indication of very little difference in the error
rates for the two labeling procedures.
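
The chi-square value can be reproduced directly from the observed and expected error counts in table 3. The sketch below assumes the usual Pearson statistic, the sum over segments of (observed - expected)^2 / expected; the critical value 16.919 is the standard upper 5 percent point of the chi-square distribution with 9 degrees of freedom.

```python
# Observed and expected numbers of labeling errors per segment (table 3),
# in the order 1005, 1033, 1060, 1231, 1520, 1604, 1675, 1803, 1805, 1853.
OBSERVED = [10, 8, 6, 2, 8, 16, 6, 3, 5, 7]
EXPECTED = [9.135, 5.265, 3.015, 4.635, 5.985, 15.750, 8.235, 0.765, 3.690, 5.265]

CRITICAL_5_PERCENT_DF_9 = 16.919  # standard chi-square table value


def chi_square(observed, expected):
    """Pearson chi-square statistic for observed versus expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


statistic = chi_square(OBSERVED, EXPECTED)
print(f"chi-square = {statistic:.3f}")
print("reject" if statistic > CRITICAL_5_PERCENT_DF_9 else "fail to reject")
```

Nearly half of the statistic comes from segment 1803, whose expected count is below 1, which is the small-expected-count situation in which the chi-square approximation is least reliable.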
TABLE 3.- OBSERVED AND EXPECTED TOTAL
NUMBER OF ANALYST LABELING ERRORS

              Total number of errors

Segment      Observed(a)    Expected(b)

1005             10            9.135
1033              8            5.265
1060              6            3.015
1231              2            4.635
1520              8            5.985
1604             16           15.750
1675              6            8.235
1803              3            0.765
1805              5            3.690
1853              7            5.265

(a) Number of errors observed out of 45
sequentially allocated pixels.

(b) Number of errors expected based on
the error rate on the type II dots.






Regarding the actual proportion estimates, table 4 shows the posterior
distribution Bayes proportion estimates produced following the sequential
allocation of 45 pixels, the proportion estimates based on the type II dots
used as a simple random sample, and the Phase III Procedure 1 estimates. The
deviation of each of these estimates from the ground-truth proportion of small
grains for each segment also appears in this table.


Several observations may be made from table 4. First, the average bias
computed over segments is smaller for the Bayesian sequential estimates than
for the simple random sample estimates or the Procedure 1 estimates. Thus, the
Bayesian sequential estimates appear to be somewhat less sensitive to the
effects of analyst bias. Also, the MSE computed over segments is smaller for
the Bayesian sequential procedure than for the other two procedures. In fact,
if we correct the MSE for the type II dot estimates and the Procedure 1
estimates to reflect an average sample size of 45 pixels rather than the
average sample sizes of 63.5 and 105.5 pixels, respectively, as given in
table 4, we obtain

$$\mathrm{MSE}_{\text{Type II adjusted}} = \frac{63.5}{45}\,(0.0118325) = 0.0166970$$
$$\mathrm{MSE}_{\text{P1 adjusted}} = \frac{105.5}{45}\,(0.0126021) = 0.0295449$$


These values, when compared to the MSE for the Bayesian sequential procedure, 
yield the following reduction in MSE values. 


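$$\frac{\mathrm{MSE}_{\text{Bayes Seq}}}{\mathrm{MSE}_{\text{Type II adjusted}}} = 0.5137 = R_1$$

$$\frac{\mathrm{MSE}_{\text{Bayes Seq}}}{\mathrm{MSE}_{\text{P1 adjusted}}} = 0.2903 = R_2$$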


The reduction in the MSE for the type II dots, $R_1$, is very close to the
value reported in reference 1 for the reduction in the MSE of the Bayesian
sequential procedure as compared to a simple random sample of the same size
using ground-truth labels. Both $R_1$ and $R_2$ represent very favorable
reductions in MSE values and tend to validate the results of the previous
study obtained using the ground truth.
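
The sample-size adjustment and the two ratios can be cross-checked
numerically. The sketch below, in Python, assumes that the adjustment simply
rescales an MSE in inverse proportion to sample size (that is, multiplies it
by n/45); this assumption reproduces the adjusted Procedure 1 figure quoted
in the text. The function and variable names are hypothetical and serve only
for illustration.

```python
TARGET_N = 45  # pixels allocated per segment by the sequential procedure


def adjust_mse(mse, n, target_n=TARGET_N):
    """Rescale an MSE observed at average sample size n to target_n,
    assuming the MSE is inversely proportional to the sample size."""
    return mse * n / target_n


# Procedure 1: MSE of 0.0126021 at an average of 105.5 pixels (from the text).
mse_p1_adjusted = adjust_mse(0.0126021, 105.5)

# Adjusted type II MSE and the two ratios, taken as quoted in the text.
mse_type2_adjusted = 0.0166970
r1, r2 = 0.5137, 0.2903

# Each ratio implies a Bayesian sequential MSE; the two should agree.
mse_bayes_from_r1 = r1 * mse_type2_adjusted
mse_bayes_from_r2 = r2 * mse_p1_adjusted

print(f"adjusted Procedure 1 MSE: {mse_p1_adjusted:.7f}")
print(f"implied Bayes MSE: {mse_bayes_from_r1:.6f} vs {mse_bayes_from_r2:.6f}")
print(f"reduction factors: {1 / r1:.2f} and {1 / r2:.2f}")
```

The two implied values of the Bayesian sequential MSE agree closely, which
suggests that the quoted ratios and adjusted MSE values are mutually
consistent. The reciprocals of the ratios give the reduction factors relative
to each comparison procedure.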

TABLE 4.- PROPORTION ESTIMATES USING
DIFFERENT PROCEDURES


5. CONCLUSIONS AND RECOMMENDATIONS 


This study indicates that the Bayesian sequential dot allocation and
proportion estimation procedure does not significantly increase the analyst
labeling error rate. In addition, as compared to a simple random sample, the
procedure reduces the MSE by a factor of approximately two. When compared to
Procedure 1, it reduces the MSE by a factor of approximately three. These
results validate the advantages to be obtained in using this procedure with
analyst labels.

The fact that the procedure was implemented on a small programmable calculator
indicates that it is operationally feasible. However, it should be mentioned
that the dot selection part of the program was slower than the normal analyst
dot-labeling rate. Another yet-to-be-resolved issue is the development of a
technique for selecting pixels from clusters without revealing to the analyst
the identity of the cluster in which the pixels fall. It is felt that the
knowledge that pixels fall in the same or different clusters may bias the
analyst decision. One obvious solution to the computer-time problem and the
cluster identity problem would be to implement the procedure on a mainframe
computer with interactive analyst access via a terminal. Using this approach,
the cluster identities of all the grid intersection pixels could be retained
in the computer and therefore would not have to be revealed to the analyst.
A larger computer should also be able to select pixels faster than an analyst
can label them.

In conclusion, it is recommended that steps be initiated for incorporating 
this procedure in a large-scale test using fully developed analyst procedures. 